GLS_REGRESSION

Overview

The GLS_REGRESSION function fits a Generalized Least Squares (GLS) regression model, an extension of ordinary least squares (OLS) designed to handle situations where the error terms exhibit non-constant variance (heteroscedasticity) or are correlated with each other (autocorrelation). GLS was first described by Alexander Aitken in 1935 and remains a fundamental technique in econometrics and time series analysis.

In standard OLS regression, the Gauss-Markov theorem guarantees that the OLS estimator is the best linear unbiased estimator (BLUE) when the errors have zero mean, constant variance, and are uncorrelated with one another. However, when these assumptions are violated, OLS estimates remain unbiased but are no longer efficient, and the usual standard errors become unreliable. GLS addresses this by incorporating a known error covariance matrix \Omega into the estimation process.

The GLS estimator minimizes the squared Mahalanobis distance of the residual vector, (y - X\beta)^T \Omega^{-1} (y - X\beta), which yields the closed-form solution:

\hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y

This is equivalent to applying OLS to a transformed version of the data. Using Cholesky decomposition \Omega = CC^T, the model is whitened by multiplying both sides by C^{-1}, which standardizes the scale and decorrelates the errors. The resulting estimator is unbiased, consistent, efficient, and asymptotically normal with covariance:

\text{Cov}[\hat{\beta} | X] = (X^T \Omega^{-1} X)^{-1}
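
To make the two formulas concrete, the following NumPy sketch (made-up data and an assumed tridiagonal \Omega) computes \hat{\beta}_{GLS} from the closed-form expression and again by Cholesky whitening followed by OLS; the two routes agree.

import numpy as np

# Made-up data: 4 observations, intercept column plus one predictor
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.5, 2.5, 3.5, 4.5])

# Assumed known error covariance matrix Omega
Omega = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0, 0.5],
                  [0.0, 0.0, 0.5, 1.0]])
Omega_inv = np.linalg.inv(Omega)

# Closed-form GLS estimate: (X^T Omega^-1 X)^-1 X^T Omega^-1 y
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Equivalent route: whiten with the Cholesky factor C (Omega = C C^T),
# then run OLS on C^-1 X and C^-1 y
C = np.linalg.cholesky(Omega)
beta_whitened, *_ = np.linalg.lstsq(np.linalg.solve(C, X),
                                    np.linalg.solve(C, y), rcond=None)

# Estimator covariance: (X^T Omega^-1 X)^-1
cov_beta = np.linalg.inv(X.T @ Omega_inv @ X)

print(beta_gls)       # [0.5, 1.0] -- the data lie exactly on y = 0.5 + x
print(beta_whitened)  # same values via the whitening route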

When \Omega is diagonal, GLS reduces to weighted least squares (WLS), which handles heteroscedasticity without correlation between observations. This implementation uses the statsmodels library’s GLS class. The function returns coefficient estimates, standard errors, t-statistics, p-values, confidence intervals, and model fit statistics including log-likelihood, AIC, and BIC.
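
For instance, a small sketch (hypothetical heteroscedastic variances, statsmodels imported as sm) shows that GLS with a diagonal sigma and WLS with weights equal to the reciprocal variances produce the same coefficients:

import numpy as np
import statsmodels.api as sm

# Hypothetical data with heteroscedastic errors
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.3, 2.9, 4.2, 4.8])
X = sm.add_constant(x)

# Assumed known error variances (the diagonal of Omega)
variances = np.array([0.5, 1.0, 1.5, 2.0, 2.5])

# GLS with a diagonal covariance matrix ...
gls_fit = sm.GLS(y, X, sigma=np.diag(variances)).fit()

# ... matches WLS with weights 1/variance
wls_fit = sm.WLS(y, X, weights=1.0 / variances).fit()

print(gls_fit.params)
print(wls_fit.params)  # identical coefficients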

For more details on the theoretical background, see the Wikipedia article on Generalized Least Squares and the statsmodels GLS documentation.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLS_REGRESSION(y, x, sigma, fit_intercept, alpha)
  • y (list[list], required): Column vector of dependent variable values (n x 1 array)
  • x (list[list], required): Matrix of independent variables (n x p array where p is the number of predictors)
  • sigma (list[list], optional, default: null): Error covariance matrix (n x n array). Defaults to identity matrix if not provided.
  • fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1)

Returns (list[list]): 2D list with a header row, one row per coefficient (coefficient, std_error, t_statistic, p_value, ci_lower, ci_upper), and trailing rows for log_likelihood, AIC, and BIC; an error message string is returned for invalid input.

Examples

Example 1: Simple regression with default settings

Inputs:

y x
1 1
2 2
3 3
4 4

Excel formula:

=GLS_REGRESSION({1;2;3;4}, {1;2;3;4})

Expected output:

"non-error"

Example 2: Regression without an intercept

Inputs:

y x fit_intercept
2 1 false
4 2
6 3
8 4

Excel formula:

=GLS_REGRESSION({2;4;6;8}, {1;2;3;4}, FALSE)

Expected output:

"non-error"

Example 3: Regression with a user-supplied error covariance matrix

Inputs:

y x sigma
1.5 1 1 0.5 0 0
2.5 2 0.5 1 0.5 0
3.5 3 0 0.5 1 0.5
4.5 4 0 0 0.5 1

Excel formula:

=GLS_REGRESSION({1.5;2.5;3.5;4.5}, {1;2;3;4}, {1,0.5,0,0;0.5,1,0.5,0;0,0.5,1,0.5;0,0,0.5,1})

Expected output:

"non-error"

Example 4: Two predictors with 90% confidence intervals

Inputs:

y x1 x2 fit_intercept alpha
3 1 0.5 true 0.1
5 2 1
7 3 1.5
9 4 2

Excel formula:

=GLS_REGRESSION({3;5;7;9}, {1,0.5;2,1;3,1.5;4,2}, TRUE, 0.1)

Expected output:

"non-error"

Python Code

import math
import numpy as np
from statsmodels.regression.linear_model import GLS as statsmodels_gls

def gls_regression(y, x, sigma=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Least Squares (GLS) regression model.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.GLS.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Column vector of dependent variable values (n x 1 array)
        x (list[list]): Matrix of independent variables (n x p array where p is the number of predictors)
        sigma (list[list], optional): Error covariance matrix (n x n array). Defaults to identity matrix if not provided. Default is None.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLS results, or error message string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_array(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if len(arr) == 0:
            return f"Invalid input: {name} cannot be empty."
        for row in arr:
            if not isinstance(row, list):
                return f"Invalid input: {name} must be a 2D list."
        return None

    def to_numpy(arr_2d, name):
        try:
            arr = np.array(arr_2d, dtype=float)
        except Exception:
            return f"Invalid input: {name} must contain numeric values."
        if np.any(np.isnan(arr)) or np.any(np.isinf(arr)):
            return f"Invalid input: {name} must contain finite values."
        return arr

    # Normalize inputs
    y = to2d(y)
    x = to2d(x)
    if sigma is not None:
        sigma = to2d(sigma)

    # Validate inputs
    err = validate_2d_array(y, "y")
    if err:
        return err
    err = validate_2d_array(x, "x")
    if err:
        return err

    # Convert to numpy arrays
    y_arr = to_numpy(y, "y")
    if isinstance(y_arr, str):
        return y_arr
    x_arr = to_numpy(x, "x")
    if isinstance(x_arr, str):
        return x_arr

    # Validate dimensions
    if y_arr.ndim != 2 or y_arr.shape[1] != 1:
        return "Invalid input: y must be a column vector (n x 1)."
    n = y_arr.shape[0]
    if x_arr.ndim != 2:
        return "Invalid input: x must be a 2D matrix."
    if x_arr.shape[0] != n:
        return "Invalid input: y and x must have the same number of rows."

    # Handle sigma
    if sigma is not None:
        err = validate_2d_array(sigma, "sigma")
        if err:
            return err
        sigma_arr = to_numpy(sigma, "sigma")
        if isinstance(sigma_arr, str):
            return sigma_arr
        if sigma_arr.ndim != 2:
            return "Invalid input: sigma must be a 2D matrix."
        if sigma_arr.shape[0] != n or sigma_arr.shape[1] != n:
            return f"Invalid input: sigma must be {n} x {n} to match the number of observations."
    else:
        sigma_arr = None

    # Validate alpha
    if not isinstance(alpha, (int, float)):
        return "Invalid input: alpha must be a number."
    if math.isnan(alpha) or math.isinf(alpha):
        return "Invalid input: alpha must be finite."
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Add intercept if requested
    if fit_intercept:
        x_arr = np.hstack([np.ones((n, 1)), x_arr])

    # Flatten y for statsmodels
    y_vec = y_arr.flatten()

    # Fit GLS model
    try:
        model = statsmodels_gls(y_vec, x_arr, sigma=sigma_arr)
        results = model.fit()
    except Exception as exc:
        return f"statsmodels.regression.linear_model.GLS error: {exc}"

    # Extract results
    try:
        params = results.params
        std_err = results.bse
        t_stats = results.tvalues
        p_values = results.pvalues
        conf_int = results.conf_int(alpha=alpha)

        # Build output table
        output = [['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']]

        # Parameter names
        if fit_intercept:
            param_names = ['intercept'] + [f'x{i+1}' for i in range(x_arr.shape[1] - 1)]
        else:
            param_names = [f'x{i+1}' for i in range(x_arr.shape[1])]

        # Add parameter rows
        for i, name in enumerate(param_names):
            output.append([
                name,
                float(params[i]),
                float(std_err[i]),
                float(t_stats[i]),
                float(p_values[i]),
                float(conf_int[i, 0]),
                float(conf_int[i, 1])
            ])

        # Add model statistics
        output.append(['log_likelihood', float(results.llf), '', '', '', '', ''])
        output.append(['aic', float(results.aic), '', '', '', '', ''])
        output.append(['bic', float(results.bic), '', '', '', '', ''])

    except Exception as exc:
        return f"Error extracting results: {exc}"

    return output
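
A minimal usage sketch (values mirror Example 3 above):

result = gls_regression(
    y=[[1.5], [2.5], [3.5], [4.5]],
    x=[[1], [2], [3], [4]],
    sigma=[[1, 0.5, 0, 0],
           [0.5, 1, 0.5, 0],
           [0, 0.5, 1, 0.5],
           [0, 0, 0.5, 1]],
)
for row in result:
    print(row)  # header row, coefficient rows, then log_likelihood / aic / bic rows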

Online Calculator